{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "74ae54e0-1b18-4389-b50d-29eb208bf79f",
   "metadata": {},
   "source": [
    "# Accessing stored vector parts directly\n",
    "\n",
    "The Query interface can return the stored vector parts of each result on request. This is useful for several reasons, for example:\n",
    "- testing and monitoring in general\n",
    "- building logic on the vectors themselves"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "dd20a65d-bc10-404e-bb12-4be0af0cbfde",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install superlinked==37.5.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "023e8d69-be05-4ac8-9513-d9dcacb0193a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from superlinked import framework as sl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bc9bfa7-5302-4a93-9f16-ce54113aea37",
   "metadata": {},
   "source": [
    "Let's create a simple config and ingest two text entities for demonstration purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "3a35be4b-c014-49ce-a058-871c6ad35fc7",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Paragraph(sl.Schema):\n",
    "    id: sl.IdField\n",
    "    body: sl.String\n",
    "    like_count: sl.Integer\n",
    "\n",
    "\n",
    "paragraph = Paragraph()\n",
    "\n",
    "body_space = sl.TextSimilaritySpace(text=paragraph.body, model=\"sentence-transformers/all-mpnet-base-v2\")\n",
    "like_space = sl.NumberSpace(number=paragraph.like_count, min_value=0, max_value=100, mode=sl.Mode.MAXIMUM)\n",
    "# an index can be built on top of multiple spaces as simply as this\n",
    "paragraph_index = sl.Index([body_space, like_space], fields=paragraph.like_count)\n",
    "\n",
    "source: sl.InMemorySource = sl.InMemorySource(paragraph)\n",
    "executor = sl.InMemoryExecutor(sources=[source], indices=[paragraph_index])\n",
    "app = executor.run()\n",
    "\n",
    "source.put(\n",
    "    [\n",
    "        {\n",
    "            \"id\": \"paragraph-1\",\n",
    "            \"body\": \"Glorious animals live in the wilderness.\",\n",
    "            \"like_count\": 75,\n",
    "        },\n",
    "        {\n",
    "            \"id\": \"paragraph-2\",\n",
    "            \"body\": \"Growing computation power enables advancements in AI.\",\n",
    "            \"like_count\": 10,\n",
    "        },\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0dff421-bbd7-42b0-917d-7f90e59c4706",
   "metadata": {},
   "source": [
    "The vector parts are contained in the `Result`'s metadata. They are included only when requested via the `metadata` argument of a `.select` clause, which takes the spaces whose vector parts should be returned. If you'd like to learn more about `.select` clauses, please refer to the [querying_options notebook](https://github.com/superlinked/superlinked/blob/main/notebook/feature/querying_options.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7b431863-7e47-4a76-9fa0-44cade861fe9",
   "metadata": {},
   "outputs": [],
   "source": [
    "space_weights: dict[sl.Space, float] = {like_space: 1.0, body_space: 1.0}\n",
    "metadata_query = (\n",
    "    sl.Query(index=paragraph_index, weights=space_weights).find(paragraph).select_all(metadata=[body_space, like_space])\n",
    ")\n",
    "\n",
    "result = app.query(metadata_query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc2c3f7e-87fd-4faf-a020-577b12b47340",
   "metadata": {},
   "source": [
    "Note that this is a [Result](https://github.com/superlinked/superlinked/blob/main/notebook/feature/query_result.ipynb) coming from the same [query interface](https://github.com/superlinked/superlinked/blob/main/notebook/feature/querying_options.ipynb) as any other, just with additional metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "65c7b834-7385-4d8b-b3cb-cc5218675666",
   "metadata": {},
   "outputs": [],
   "source": [
    "first_entry = result.entries[0]\n",
    "second_entry = result.entries[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b2a72ea-75bf-4b7b-9577-04054a837af4",
   "metadata": {},
   "source": [
    "For the first entry, let's examine the `vector_parts` attribute of the metadata object. This is where the requested vectors are stored.\n",
    "\n",
    "The first vector part is a 768-dimensional text embedding of the `body` attribute, which matches the embedding dimensionality of the [model](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "64aef642-bb09-4bc8-84bd-2c299cbd5a68",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "768\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[0.012647623288110286, 0.05507237348049912, -0.023800908901327893]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(len(first_entry.metadata.vector_parts[0]))\n",
    "first_entry.metadata.vector_parts[0][:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4c3a272-49fc-4176-af89-ac996b9e66b9",
   "metadata": {},
   "source": [
    "The second vector part is our very own [number embedding](https://github.com/superlinked/superlinked/blob/main/notebook/feature/number_embedding_minmax.ipynb) of the `like_count` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0ece00d7-f3ec-42c3-abce-385b312ee068",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0.6532814824381882, 0.2705980500730985, 0.0]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_entry.metadata.vector_parts[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce6e8776-5c4f-4a9b-bc6a-245ccf8ba5d1",
   "metadata": {},
   "source": [
    "The same structure applies to the second entry in the `Result` as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "76c14627-4709-4608-9a88-52a43a7d74b3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "768\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[-0.012543628009221226, 0.055735641262102856, -0.034339140038615434]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(len(second_entry.metadata.vector_parts[0]))\n",
    "second_entry.metadata.vector_parts[0][:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c83b6790-382f-4a0f-9e75-868c765b9100",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0.11061587104123714, 0.6984011233337103, 0.0]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "second_entry.metadata.vector_parts[1]"
   ]
  },
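  {
   "cell_type": "markdown",
   "id": "number-part-sketch-md",
   "metadata": {},
   "source": [
    "As a sanity check, the two number vector parts shown above can be reproduced by hand. The sketch below is an observation fitted to these outputs, not the library's documented implementation: it assumes the value is mapped linearly from `[min_value, max_value]` to an angle in `[0, π/2]`, encoded as `(sin, cos, 0.0)`, and scaled by `1/√2`, a factor that presumably comes from normalizing the two equal space weights. `sketch_number_part` and the constants are hypothetical names introduced only for illustration, and the role of the trailing `0.0` is not covered here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "number-part-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "# Assumptions inferred from the outputs above, not taken from the library source.\n",
    "MIN_VALUE, MAX_VALUE = 0, 100  # as configured for like_space\n",
    "SCALE = 1 / math.sqrt(2)  # presumably the effect of two equal weights (1.0 and 1.0)\n",
    "\n",
    "\n",
    "def sketch_number_part(value: float) -> list[float]:\n",
    "    # Hypothetical reconstruction of the like_count vector part.\n",
    "    angle = (value - MIN_VALUE) / (MAX_VALUE - MIN_VALUE) * math.pi / 2\n",
    "    return [math.sin(angle) * SCALE, math.cos(angle) * SCALE, 0.0]\n",
    "\n",
    "\n",
    "# Compare the sketch with the number vector parts printed above.\n",
    "observed_parts = {\n",
    "    75: [0.6532814824381882, 0.2705980500730985, 0.0],\n",
    "    10: [0.11061587104123714, 0.6984011233337103, 0.0],\n",
    "}\n",
    "for like_count, observed in observed_parts.items():\n",
    "    sketch = sketch_number_part(like_count)\n",
    "    assert all(math.isclose(s, o, abs_tol=1e-9) for s, o in zip(sketch, observed))"
   ]
  },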
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d3ec0944-ff4b-4ac9-a0a8-3f14c269bdc4",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "⚠️ Keep in mind that the order of the vector parts in the metadata <b>ALWAYS</b> follows the order of the spaces given when the index was created.\n",
    "</div>\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "5ec2ef78-6a50-4174-b74e-06d0dc07f6f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "alternative_order_query = (\n",
    "    sl.Query(index=paragraph_index, weights=space_weights).find(paragraph).select_all(metadata=[like_space, body_space])\n",
    ")  # switched order of the spaces\n",
    "\n",
    "result_alternative = app.query(alternative_order_query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "087f69d8-57a0-466d-88b6-47d963760dae",
   "metadata": {},
   "source": [
    "The order of the vector parts in the metadata is unaffected by the order in which the spaces were passed to the `metadata` argument of the `.select_all()` clause."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b0a03ad8-9911-4908-94ed-b0d9747281b5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result_alternative.entries[0].metadata.vector_parts[0][:3] == first_entry.metadata.vector_parts[0][:3]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "superlinked-py3.11",
   "language": "python",
   "name": "superlinked-py3.11"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
